Biostar

11,668 results • Page 2 of 234

Hi, I would like to rename the sequences in my file to have leading zeros in front of the header name using unix or perl based on the number of sequences

sequence fasta

updated 8.2 years ago • rrsowmya
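For the zero-padding question above, a minimal Python sketch follows. The snippet does not show the desired header layout, so the `seq` prefix and the rule "pad width = number of digits in the record count" are assumptions, and `pad_headers` is a hypothetical helper name.

```python
def pad_headers(lines, prefix="seq"):
    """Rename FASTA records to prefix + zero-padded running number.

    The pad width is the number of digits in the total record count,
    so 10 records give seq01..seq10 (assumed naming scheme).
    """
    lines = [line.rstrip("\n") for line in lines]
    total = sum(1 for line in lines if line.startswith(">"))
    width = len(str(total))
    out = []
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1
            out.append(f">{prefix}{n:0{width}d}")
        else:
            out.append(line)  # sequence lines pass through unchanged
    return out
```

This reads the whole file into memory to count records first; for very large files a two-pass read would be the safer choice.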

o alaikum everyone, I am working with multiple genes, and in each gene folder I have multiple (70-75) FASTA files, each containing a single gene sequence, e.g. AMY2b_Gene_folder Chimpanzee_AMY2B_CDS.fasta...> AGCTCCCAAGGGATTTGGAGGGGTTCAGGTCTCTCCACCAAATGAAAATGTTGCAATTCACAACCCTTTC I want to change the headers of each fasta file according to a specific order given in a text file. …

editing FASTA headers text processing command line

updated 6.7 years ago • adeena_hassan

Hello people, I hope you are well. I was wondering if you can help me. I need to batch rename a large number of RefSeq genome files in ".fna" format. Below I show you an example of the file headers: GCA_000007685.1_ASM768v1_genomic.fna...TTGTTTTTACTACTTGGATATATGAAAAAATACTTTGGAACTTGTTTCAAAAGTTAGAATGTGGGGTCTTCTTCAAAAAA My idea is to rename them to look like this: Li_serovar_Copenhageni_Fi…

fasta refseq

updated 14 months ago • BATMAN

I am trying to open the fasta file with sequences. My bio perl script opens the sequence but not with fasta header. How can I get fasta header with bio...perl -w use Bio::SeqIO; $seqio_obj = Bio::SeqIO->new(-file=> "no_plasmid.fasta", -format=>"fasta"); $seq_obj = $seqio_obj->next_seq; $acc = $so->accession_number; while($seq_obj = $seqio_obj->next_s…

RNA-Seq perl bio perl

updated 7.8 years ago • bandanaschapagain

Hi, I have a fasta file with 300 protein sequences. I intend to construct a phylogenetic tree with it. I would want only the accession number...and the organism name in the fasta header and remove the rest of the information. Can anybody suggest how to do this? I have a linux based system with perl...and python installed. For example, i want to convert a header like this: >gi|685204…

sequence edit

updated 8.0 years ago • bkvijay.jayaraman

I have a fasta file with headers that I want to compare with two other text files and if it's present in the first file, put it first on the...to compare the text files to the fasta header and if there's a match, organize/reorder the fasta header file so that that match name is the first entry on the header...look like this. (for the first one since fly is present in file2, place it as the first …

fasta python

updated 18 months ago • jnora0625

Hi, I have 10 fasta files (each file with 20 gene sequences from each of the 10 samples). I would like to create 20 files, specific to each gene...from 10 samples. I proceeded as follows to extract genes with the file_name in header: pyfasta extract --header --fasta test.fasta gene_name1 | awk '/^>/ {$0=$0 "_sample1"}1' > gene_name1.fasta Output: >gene_na…

pyfasta header fasta bash gene

updated 6.7 years ago • bioinfo8

I have a fasta file named `119XCA.fasta`, as shown below: >cellulase ATGCTA >gyrase TGATGCT >16s TAGTATG I need to remove all the...fasta headers, keep the sequences one by one and write the file name as the fasta header. The expected outcome is shown below...TAGTATG I have used the following script `sed '/^>/d' foo.fa > out.fa` which re…

gene sequence genome alignment next-gen

updated 3.6 years ago • Kumar

I have strange fasta headers like this for a good number of sequences, >gi|61221638|sp|P0A366.1| >gi|61221640|sp|P0A368.1|CR1AA_BACTE...I would like to replace the other (`>gi`) in the fasta header with a blank or `;`. Can anyone suggest how to do it? I have many such sequences in a big fasta file

awk perl sed unix python

updated 4.7 years ago • empyrean999

Hi, please let me know how I can edit the fasta file header. >LR99555.1 Avo, chromosome: 1 I want the header to look like this: >LR99555.1

linux

updated 2.1 years ago • p
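The header trimming asked about above (keep `>LR99555.1`, drop the description) can be sketched in Python as follows. This assumes the ID is everything up to the first whitespace in the header; `trim_header` is a hypothetical helper name.

```python
def trim_header(line):
    """Keep only the first whitespace-delimited token of a FASTA header.

    Sequence lines are returned unchanged apart from the trailing newline.
    """
    if line.startswith(">"):
        return line.split(None, 1)[0]
    return line.rstrip("\n")
```

Applied line by line, this turns `>LR99555.1 Avo, chromosome: 1` into `>LR99555.1` and leaves sequence lines alone.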

if this has been asked before, but I have a genome assembly file that I just converted from .bam to fasta format in order to start annotation. I would like to run CEGMA on this assembly, because I have concerns about the quality...but the problem is that the default header format when the fasta was created is not acceptable. This is because in the current format here are 5237924 sequences...with …

Assembly

updated 23 months ago • zgayk

I have a large fasta file. As you see below, there is a > sign present inside some fasta headers, like >exon2_ENST00000218032|>exon2_ENST00000218032...>exon17_ENST00000253024|>exon17_ENST00000253024 I want to remove the > sign from inside the header; after removal the header should look like this >exon2_ENST00000218032|exon2_ENST00000218032 …

fasta

updated 3.1 years ago • harry
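For the stray `>` question above, one possible Python sketch is shown here. It assumes that only the leading `>` of a header line is meaningful and that every later `>` on that line can simply be deleted; `strip_inner_gt` is a hypothetical name.

```python
def strip_inner_gt(line):
    """Remove every '>' from a FASTA header except the leading one."""
    line = line.rstrip("\n")
    if line.startswith(">"):
        return ">" + line[1:].replace(">", "")
    return line  # sequence lines are left untouched
```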

I have downloaded a reference uniprotkb FASTA file. How can I extract only the FASTA headers of each gene (row-wise) into a CSV file using R?

updated 14 months ago • WUSCHEL

I have a FASTA-file like this: >seqA AAAAAAAAAA >seqB AAAAAAAAAA >seqC TTTTTTTTTT >seqD CCCCCCCCCC >seqE CCCCCCCCCC >seqF...AAAAAAAAAA I'm recently learning SeqKit, and I've found that rename can append _N in the header based on the occurrence of the sequence, and also that rmdup can remove…

seqkit fasta

updated 7 weeks ago • Broccoli

converting the file from fastq to fasta SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta") no problem; but now I would like to change the header of the fasta file...m stuck. When I add the `SeqIO.parse` function like this for seq_record in SeqIO.parse(labels[0]+".fasta","fasta"): seq_record.id = labels[0] # renaming the pseudogene with the lab id SeqIO.write(seq_…

biopython processing

updated 3.1 years ago • skbrimer

Hi community, I am not an expert with sed, but I want to edit the headers of each sequence in a fasta file. I want to keep only the gene id **>NODE_39_length_59461_cov_85.505003_1** The header

edition sed headers command fasta

updated 2.3 years ago • Candela

Hi, I would like to modify the fasta headers from a file. I would like to change: >A0A0F2M4U6|A0A0F2M4U6_SPOSC Endoplasmic reticulum chaperone BiP OS

format header fasta

updated 2.5 years ago • marcus.teixeira

Hello, I have a lot of sequences in a FASTA file, and I want to extract a specific sequence knowing the header ID. For example, the header of a sequence is: NODE_19_length_5758_cluster_19_candidate_1...I know that with `grep` I can extract the header, but I want the sequence below it to appear on stdout as well. How can I do this in bash?

fasta bash

updated 3.4 years ago • v.berriosfarias
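The question above asks for bash. As an alternative, here is a minimal Python sketch of the same idea: print the header together with all sequence lines that follow it. It assumes the ID is the first token after `>`; `fetch_record` is a hypothetical helper name.

```python
def fetch_record(lines, wanted_id):
    """Return the header and sequence lines of the record whose ID matches."""
    keep = False
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # The ID is taken to be the first token after '>'.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == wanted_id
        if keep:
            out.append(line)
    return out
```

Because matching is done on the whole ID rather than a substring, `NODE_19` would not accidentally pull in `NODE_190`.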

In this example of a fasta file, you see that some fasta sequences are repeated many times, for example: exon19_ENST00000194900|exon21_ENST00000194900...exon18_ENST00000194900|exon21_ENST00000194900 So I want to remove all fasta sequences which have the same header in the fasta file and keep only 1 fasta sequence. I want to remove fasta sequences on...the basis of header, not the sequence. Thanks in…

fasta header

updated 3.2 years ago • harry
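Deduplicating on the header alone, as asked above, can be sketched in Python like this. It assumes the first record seen for each header is the one to keep and compares full header lines; `dedupe_by_header` is a hypothetical name.

```python
def dedupe_by_header(lines):
    """Keep only the first record for each distinct FASTA header line."""
    seen = set()
    keep = False
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            keep = line not in seen  # drop the record if the header repeats
            seen.add(line)
        if keep:
            out.append(line)
    return out
```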

Hey guys, I have a multi-fasta protein file like this >SF_hydrolase MKG... >LH_reductase MKI... >SM_hydrolase MSN... Basically, I would like to extract...only the fasta headers that have the other "reductase". I know how to extract headers that have the same headers as the ones present on...a list, but I don't know how to extract fasta-headers solely based on o…

Assembly sequencing

updated 5.2 years ago • genomes_and_MGEs

Hello everyone, can anyone guide me in editing the fasta header? My fasta header is shown below >NP_006556.1 transcriptional repressor CTCF isoform 1 [Homo sapiens

gene

updated 3.6 years ago • bioinformatics.queries

So I have a directory full of fasta files and I want to change the fasta header in each one to the name of the corresponding fasta file. For example: HC1993.fa...> X58834 CCTGCATCTGCAA HC1993.fa > HC1993 CCTGCATCTGCAA I have about 50 fasta files like that in a directory that I want to do the same thing to. I've been using this sed command for one file that works...sed '…

sequence fasta bash loop sed unix

updated 3.9 years ago • tpaisie
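For the file-name-as-header question above, a short Python sketch follows. It assumes each file holds a single record and that the new header is the file name without its extension; `header_from_filename` is a hypothetical helper, not the sed command from the post (which is truncated in the snippet).

```python
from pathlib import Path


def header_from_filename(filename, lines):
    """Replace every header line with '>' + the file name minus extension."""
    new_header = ">" + Path(filename).stem
    return [
        new_header if line.startswith(">") else line.rstrip("\n")
        for line in lines
    ]
```

To cover a whole directory, one could loop over `Path(".").glob("*.fa")`, read each file's lines, and write the result to a new file rather than editing in place.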

Hey all, I'm working with a lot of data from NCBI and at the moment I'm kind of stuck. I have a ton of fasta files, either containing genomic contigs or the 16S sequences I extracted from those genomes using RNAmmer. The files were automatically downloaded from NCBI and are named like this: GCF_000284355.1_ASM28435v1_genomic.fasta (for contigs) GCF_000284355.1_ASM28435v1_genomic_16S.fasta (for …

genome Assembly sequence

updated 5.1 years ago • Guillaume.Tahon

Hello everybody! I have a fasta file I'm looking to work with in qiime. Unfortunately, it doesn't currently meet their formatting requirements. I need...to change headers like this: >3180275|DCO_MAC_Bv6--LI09_3|40099 XXXXXXXXXXXXXXXXXXXXXX >13488354|DCO_MAC_Bv6--LD09_2_3|2 XXXXXXXXXXXXXXXXXXXXXX...>333430241|DCO_MAC_Bv6--LO13_8|1 XXXXXXXXXXXXXXXXXXXXXX To…

awk fasta headers sed

updated 12 months ago • Dani

courtesy) 201200175|A|name1|175|2012 201200287|A|name2|287|2012 201200845|A|name3|845|2012 my fasta file looks like.. >201200175 >201200287 >201200845 I want the output like... >201200175|A|name1|175|2012 >201200287

sequence

updated 4.5 years ago • Shaminur

I have 5000 FASTA sequences with Uniprot ids. Now, I want to add a unique identifier at the beginning of each FASTA header. An example will...And so on I want to add ABC0001 to ABC5000 at the beginning of the fasta header. And the corresponding gene name from my txt file. gopA ABC0001 A12345 gopD ABC0002 B57384 ........................ fotR ABC5000 C12345...And so on As I understand, I …

fasta perl awk

updated 10.5 years ago • bioinfo

Hi all friends, I have a large fasta file in which most sequences have an identical header (they differ in length). I usually extracted the sequences of interest...requires the Biopython library" sys.exit(0) try: fasta_file = sys.argv[1] # Input fasta file wanted_file = sys.argv[2] # Input wanted file, one gene name per line result_file = sys.argv[3] # Output fasta file …

fasta extracting identical header

updated 7.2 years ago • seta

Hello; I need to process fasta header by matching fasta description (not fasta id) with a first column in a another file with two columns and print second...column in file on to fasta header. Here are examples and what i have till now. file1.txt (list file) group_1 gene 1 group_2 gene 2 group_3 gene 3 group_4...my $input; close $infile; { local $/ = undef; …

perl unix fasta

updated 7.8 years ago • empyrean999

Hello everyone, I am trying to replace the headers A of a FASTA file (file.fasta) with headers B. For this, I have a list which matches the header names. >A_1 >B_1 >A_2 >...B_2 >A_3 >B_3 etc... I am using this loop to replace the headers: cat list | while read f ; do echo $f > temp_file A=$(awk '{print $1}…

fasta FASTA sed loop

updated 3.6 years ago • Begonia_pavonina

Hi everyone! I need help with something. I am very new to bioinformatics. I have a fasta file with 32K reference sequences for an X gene. The headers are the Accession numbers, but I need to change them for the...So I think I already did the hardest part) but now I need to combine this information and change the headers of my fasta to the GI of each sequence. I've tried with this script: ``` …

header fasta

updated 21 months ago • marcelavillegasp

I have a fasta file with the following format: >BNY.1.2.t17987.mrna1 CDS=1-1065 seq... How can I remove everything after ".mrna1" from...the headers

fasta RNA-Seq RNA transcriptome

updated 4.3 years ago • 2822462298

As above, I have a fasta file with long names and I want to rename them to include just the first and last name, like: >exon9_ENST00000462434:exon25_ENST00000462434

fasta name

updated 3.2 years ago • harry

Hello, I have a list of headers and I need to extract the corresponding sequences from the fasta file. How can I do it? Kindly let me know. The header file looks like this >...>TRINITY_DN74659_c0_g1_i1 >TRINITY_DN74659_c0_g1_i1 >TRINITY_DN74698_c0_g1_i1 fasta file looks like this >TRINITY_DN74697_c0_g1_i1 len=243 path=[221:0-242] [-1, 221, -2] GTATGTCCCACCAGACAC…

Fasta

updated 23 months ago • Princy

of interest. I am almost done with the script. But I would also like to include gene names in the fasta headers. By default, it only includes coordinates in fasta headers. Below is my script: >coords=Chr1 1000 2000 forward...>TTTGGGGTTATAAATTATTAGAAGTT...... I was wondering if there is a way to include the gene name in the fasta header. Thanks, R

pybedtools python

updated 7.2 years ago • RT

I am a newbie to linux stuff... I would like to modify the header of a fasta file. **My header is like: >100123_00010T gene=100123_00010** **And, I would like to have headers like "100123_00010

fastfile modification

updated 13 months ago • hellokwmin

Hi, I have a fasta file, which has some same headers like below. They have different sequence but same header. How can I merge them or what...should I do? I want to run orthoMCL but it requires unique headers. ``` >c12358_g1_i9 >c12358_g1_i9

genome sequence

updated 21 months ago • Mehmet

Hello, I am trying to convert my vcf files to fasta. However, after aligning to reference, vcf ID from the header disappears, and bcftools/vcftools are writing only reference...seq name in file header. Like > NC_xxxx.1 Any ideas? I run consensus script like for file in $inpath/*.vcf ; do echo $file bname=$(basename $file) echo...base name is …

consensus

updated 3.6 years ago • storm1907

Does anyone have a handy method for making a fasta header comply with the UniProt header specifications? http://www.uniprot.org/help/fasta-headers In particular, I would

sequence

updated 7.2 years ago • nickp60

Hi, I have a protein fasta file whose headers look like '>evm.model.chr.9.52'. There are almost 30k+ proteins. I have performed functional annotations...Now, I am performing some analysis and I want to add at least the protein name or even the GO term to the fasta header so it would make things a lot easier for me. I want something like; >evm.model.chr.9.52 GO:1234678 Can I do it with

protein fasta functional-annotation header

updated 15 months ago • ahmadjoyyia

Hello All, I have a multi fasta file with millions of sequences. I want to duplicate a part of the header and join it to the header itself with a pipe, while...another part (of the header) should be deleted. Let's say I have a fasta file, "input.fasta," which looks like this: >Gene1 wbdfwbf ATGCCGATGCAGTGACG...f 1 < input.fasta > out1.fasta` for deleting spa…

fasta headers duplicate

updated 2.1 years ago • bionix

reference genome sequence using the BWA software, and it gave me a .sam file. I used samtools SAM to FASTA to convert the aligned reads to fasta file. I want to look at assembly statistics and also evaluate completeness with...BUSCO. I received the following error: **The character "/" is present in the fasta header >A00600:204:HFMJ3DSX3:3:1101:3640:1125/1, which will crash Reader. Please…

Fasta BWA BUSCO

updated 18 months ago • hpalk42

Hello, I have a text file with thousands of unique sequences in fasta format. Each read has a header in the following format: 122391_Tcount2352_Acount2352_Bcount0_length293 It's obvious...was used at some point in the pipeline. I'm curious to see if anyone here has encountered this header format before and can tell me which part of the sequence header represents the count of reads. Thanks …

alignment

updated 5.3 years ago • genya35

I have a fasta file with hundreds of sequences and their respective headers. The headers (all of them) are in the format >ABCD [id_123] (gene_XYZ) [protein_ijk] [protein_id=qqq] [123..899] .......sequence............ >…

sequence

updated 7.3 years ago • leo1985.arnab

Hello I have a fasta file with sequence headers written as ``` >0|quiver|1..2075|- >0|quiver|2210..3058|- >0|quiver|3112..4169|- ``` and so on till around

sequence fasta

updated 21 months ago • utkarsh.sood

I'm wondering about the most straightforward way to extract the interval information contained in a fasta header such as the one below, and maybe pipe it into a newly created bed file. Thanks! >Mouse|chr12:112380949-112381824

fasta header bed interval

updated 6.3 years ago • rbronste
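Pulling the interval out of a header like the one above can be sketched in Python with a regular expression. The pattern assumes the exact `>name|chrom:start-end` layout shown in the question, and the coordinates are copied through unchanged. Whether they need shifting to BED's 0-based start is not stated in the post. `header_to_bed` is a hypothetical name.

```python
import re

# Assumed layout: >name|chrom:start-end
HEADER_RE = re.compile(
    r"^>(?P<name>[^|]+)\|(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)"
)


def header_to_bed(line):
    """Turn one FASTA header into a tab-separated BED-style line.

    Returns None for lines that do not match the assumed layout.
    Coordinates are passed through as written in the header.
    """
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return "\t".join([m["chrom"], m["start"], m["end"], m["name"]])
```

Running this over every line that starts with `>` and writing the non-`None` results to a file gives a four-column interval table.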

I am sure that someone will do this work faster and better than me. I would like to edit multiple fasta headers from this format. >M01380:50:000000000-AV1DH:1:1101:16094:3001 1:N:0:M636:16S_V1V3 TTCTGCCT|0|TAGACCTA|0 CS1_534R_YM3_for...3|27| to this one: >M636 As you can see, "M636" is embedded in the larger header. Thank you for always helping everybody! D

header edition fasta

updated 6.9 years ago • DVR

I want to extract **gene name** , **gene start position** and **gene stop position** from the fasta header of the fasta file. I have tried to extract based on the position but those locations are not consistent. Is there...and 17th element from this list. It works for this particular example. This does not work for other headers where these positions are different. Usually, gene name is consisten…

fasta R string

updated 3.9 years ago • lokraj2003

Hello! I have a FASTA file and I need a script that reads in the file, changes all the headers to a new format and writes out all the sequences in a new output file. The modified headers should contain, for each sequence, the species name (with "_" r…

FASTA Python script headers

updated 6.4 years ago • mpbiology.dna

How to take a specific column in sequence header identifiers of fasta file? I am having my header such as: ``` >PGM0100236.1 [Candida] scaffold00238 >PGM0100236.1 [Candida...scaffold00241 ``` I would like to take my third column alone i.e scaffold00238 for all the headers in my fasta file. Please give a simple command solution. I am new to bioinfo and linux script. Thank you

Fasta

updated 20 months ago • palani

Dear all, I want to add the suffix "/1" to the end of each fasta header in an 8.5 GB fasta file. I used the following command: perl -p -e 's/^(>.*)$/$1-New_Header_info/g' input.fasta

sequencing

updated 9.3 years ago • vahapel
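As a Python counterpart to the perl one-liner quoted above, the sketch below appends a suffix to every header while streaming line by line, which matters for a multi-gigabyte file. `append_suffix` is a hypothetical name and `/1` is the suffix named in the question.

```python
def append_suffix(lines, suffix="/1"):
    """Yield FASTA lines with `suffix` appended to each header line.

    Works on any iterable of lines (for example an open file handle),
    so the whole file is never held in memory.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line + suffix
        else:
            yield line
```

With an open input handle `fin` and output handle `fout`, writing `fout.write(rec + "\n")` for each `rec` in `append_suffix(fin)` would produce the modified file.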
